Nonlinear low-dimensional regression using auxiliary coordinates
Authors
Abstract
When doing regression with inputs and outputs that are high-dimensional, it often makes sense to reduce the dimensionality of the inputs before mapping to the outputs. Much work in statistics and machine learning, such as reduced-rank regression, sliced inverse regression and their variants, has focused on linear dimensionality reduction, or on estimating the dimensionality reduction first and then the mapping. We propose a method where both the dimensionality reduction and the mapping can be nonlinear and are estimated jointly. Our key idea is to define an objective function in which the low-dimensional coordinates are free parameters, in addition to the dimensionality reduction and the mapping. This decouples many groups of parameters from each other, affording a far more effective optimization than with a deep network of nested mappings, and allows a good initialization from sliced inverse regression or spectral methods. Our experiments with image and robot applications show that our approach improves over direct regression and various existing approaches.

We consider the problem of low-dimensional regression, where we want to estimate a mapping between inputs x ∈ R^Dx and outputs y ∈ R^Dy that are both continuous and high-dimensional (such as images, or control commands for a robot), but going through a low-dimensional, or latent, space z ∈ R^Dz: y = g(F(x)), where z = F(x), y = g(z) and Dz < Dx, Dy. In some situations this can be preferable to a direct (full-dimensional) regression y = G(x), for example if, in addition to the regression, we are interested in obtaining a low-dimensional representation of x for its own sake (e.g. visualization or feature extraction). Even when the true mapping G is not low-dimensional, a direct regression requires many parameters (DxDy in linear regression), and their estimation may be unreliable with small sample sizes; using a low-dimensional composite mapping g ◦ F with fewer parameters can be seen as a form of regularization and can lead to better generalization on test data. Finally, a common practical approach is to reduce the dimension of x independently of y, say with principal component analysis (PCA), and then solve the regression. However, the latent coordinates z obtained in this way do not necessarily preserve the information that is needed to predict y; this is the same reason why one would use linear discriminant analysis rather than PCA to preserve class information. We want low-dimensional coordinates z that eliminate information in the input x that is not useful for predicting the output y, in particular to reduce noise. In this sense, the problem can be seen as supervised dimensionality reduction.

Consider then the problem of least-squares regression (although our arguments should apply to other loss functions). The simplest approach to estimating the dimensionality reduction mapping F and the regression mapping g is to minimize a nested least-squares objective jointly over F and g, i.e., the sum over the training set of squared errors between each output y_n and the composite prediction g(F(x_n)).
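To make the two objectives discussed above concrete, the LaTeX sketch below writes out the nested least-squares objective and one plausible auxiliary-coordinates reformulation. The quadratic-penalty coupling term and its weight \mu are assumptions for illustration, not taken from the text shown on this page; the precise form used by the authors may differ.

% Nested objective: every parameter of F is coupled to every parameter
% of g through the composition g(F(x_n)).
\[
  E_1(F, g) \;=\; \sum_{n=1}^{N} \bigl\| y_n - g\bigl(F(x_n)\bigr) \bigr\|^2
\]

% Auxiliary-coordinates sketch (assumed quadratic-penalty form): each
% latent vector z_n in R^{Dz} is a free parameter, and the constraint
% z_n = F(x_n) is enforced only approximately through the weight \mu.
\[
  E_2(F, g, Z; \mu) \;=\; \sum_{n=1}^{N} \Bigl( \bigl\| y_n - g(z_n) \bigr\|^2
      \;+\; \mu \,\bigl\| z_n - F(x_n) \bigr\|^2 \Bigr)
\]

Under this kind of formulation, fixing Z splits the problem into two independent regressions, fitting g to the pairs (z_n, y_n) and F to the pairs (x_n, z_n), while fixing F and g lets each z_n be updated independently; this is the decoupling of parameter groups that the abstract refers to.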
Similar papers
The Role of Dimensionality Reduction in Classification
Dimensionality reduction (DR) is often used as a preprocessing step in classification, but usually one first fixes the DR mapping, possibly using label information, and then learns a classifier (a filter approach). Best performance would be obtained by optimizing the classification error jointly over DR mapping and classifier (a wrapper approach), but this is a difficult nonconvex problem, part...
The role of dimensionality reduction in linear classification
Dimensionality reduction (DR) is often used as a preprocessing step in classification, but usually one first fixes the DR mapping, possibly using label information, and then learns a classifier (a filter approach). Best performance would be obtained by optimizing the classification error jointly over DR mapping and classifier (a wrapper approach), but this is a difficult nonconvex problem, part...
A fast, universal algorithm to learn parametric nonlinear embeddings
Nonlinear embedding algorithms such as stochastic neighbor embedding do dimensionality reduction by optimizing an objective function involving similarities between pairs of input patterns. The result is a low-dimensional projection of each input pattern. A common way to define an out-of-sample mapping is to optimize the objective directly over a parametric mapping of the inputs, such as a neura...
Spectral curve, Darboux coordinates and Hamiltonian structure of periodic dressing chains
A chain of one-dimensional Schrödinger operators is called a “dressing chain” if they are connected by successive Darboux transformations. Particularly interesting are periodic dressing chains; they include finite-band operators and Painlevé equations as a special case. We investigate the Hamiltonian structure of these nonlinear lattices using V. Adler’s 2×2 Lax pair. The Lax equation and the ...
Three Dimensional Analysis of Flow Past a Solid-Sphere at Low Reynolds Numbers with the Aid of Body Fitted Coordinates
In this paper, the flow-field of an incompressible viscous flow past a solid-sphere at low Reynolds numbers (up to 270) is investigated numerically. In order to extend the capabilities of the finite volume method, the boundary (body) fitted coordinates (BFC) method is used. Transformation of the partial differential equations to algebraic relations is based on the finite-volume method with coll...